Abstract

Our article presents a robust and flexible statistical model for the growth curve
associated with the age-length relationship of Cardinalfish (Epigonus crassicaudus).
Specifically, we consider a non-linear regression model in which the error distribution
allows heteroscedasticity and belongs to the family of scale mixtures of the skew-normal
(SMSN) distributions, thus eliminating the need to transform the dependent variable in
many data sets. The SMSN family is a tractable and flexible class of asymmetric
heavy-tailed distributions that are useful for robust inference when the normality
assumption for the error distribution is questionable. Two well-known members of this
class are the proper skew-normal and skew-t distributions. In this work, emphasis is
given to the skew-t model; however, the proposed methodology can be adapted to each of
the SMSN models with some basic changes. The present work is motivated by previous
analyses of Cardinalfish age, in which a maximum age of 15 years was determined. In this
study we therefore apply the methodology to a data set that includes a long range of
ages, based on an otolith sample in which the determined longevity is higher than 54 years.

Key words: von Bertalanffy model, age-length, Cardinalfish, Heteroskedasticity, 
skew-t, influence. 



1 Introduction 

Interest in describing the growth of biological species is increasing across different
studies. This has motivated the use of several biological models proposed in the
literature to describe growth through the age-length relationship. Among these is the
von Bertalanffy (VB) growth curve (von Bertalanffy, 1938; see also Allen, 1966; Kimura,
1980; Gamito, 1998), which explains the length of a species in terms of its age by means
of a non-linear function depending on three parameters $(L_\infty, K, t_0)$, where
$L_\infty$ represents the asymptotic length of the species under study, $K$ is the growth
rate (also known as the Brody growth coefficient) and $t_0$ is the theoretical age at
zero length. Specifically, if $y$ represents the observed length at age $x$, then a
deterministic expression of the VB growth curve is given by

$$y = L_\infty\left(1 - e^{-K(x - t_0)}\right). \tag{1}$$

In order to fit equation (1) to an empirical dataset, $(y_t, x_t)$, $t = 1, \ldots, n$ say,
where $y_t$ (length) and $x_t$ (age) are the response and explanatory variables,
respectively, the VB growth curve can be described in terms of a non-linear regression as

$$y_t = \eta_t + \varepsilon_t, \tag{2}$$

$t = 1, \ldots, n$, where $\eta_t = \eta(\beta; x_t) = L_\infty\left(1 - e^{-K(x_t - t_0)}\right)$,
$\beta = (L_\infty, K, t_0)^T$ is the vector of unknown parameters and the $\varepsilon_t$
are independent random errors.
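
As a small illustration of the mean function in (1) and (2), the following Python sketch evaluates the VB curve at a few ages. The function name `vb_length` is hypothetical, and the parameter values are only illustrative (they are the rounded skew-t estimates quoted later in the paper), so this should be read as a sketch rather than as the authors' implementation.

```python
import math


def vb_length(age, l_inf, k, t0):
    """Mean length at a given age under the von Bertalanffy curve, equation (1)."""
    return l_inf * (1.0 - math.exp(-k * (age - t0)))


# Illustrative values only: rounded estimates reported later for the skew-t fit.
L_INF, K, T0 = 35.097, 0.084, -2.884

for age in (5, 20, 60):
    print(age, round(vb_length(age, L_INF, K, T0), 2))
```

By construction the curve is zero at age $t_0$ and approaches $L_\infty$ as age grows, which is why $L_\infty$ is read as the asymptotic length.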

Kimura (1980) studied this regression model under the assumptions of independence and
normality for the random errors, $\varepsilon_t \overset{iid}{\sim} N(0, \sigma^2)$ say, and
proposed the maximum likelihood method to fit the model (see also Allen, 1966). More
recently, Cubillos et al. (2009) studied the VB model using the methodology of Cope & Punt
(2007), which considers a random error in the assigned age, as determined by two different
readers. Although this last model also adopts the independence and normality assumptions
for the error terms, it assumes that the assigned age follows an exponential or gamma
distribution, in this way guaranteeing a real age composition. In addition, to study the
age-length relationship empirically, Cubillos et al. (2009) considered otolith samples of
Cardinalfish obtained from 1998 to 2007 in the south-central coastal zone of Chile
(latitude 33°S - 42°S), from which a random selection of 96 otoliths was obtained within
the length range of 20-37 cm, and with ages of less than 15 years (Galvez et al., 2000).
However, a new method for reading otoliths, described in detail by Ojeda et al. (2010),
supports the hypothesis that Cardinalfish could have a longevity of at least 54 years.
Furthermore, this species is characterized by living in waters from 100 to 550 m in depth,
but generally between depths of 250 and 300 m; and, according to commercial catch records,
its length varies mostly between 17 and 47 cm, with no significant differences between
sexes (Wiff et al., 2005).

In this paper, we study the VB growth model (1) considering a flexible class of non-normal
distributions for the random error $\varepsilon_t$. Specifically, as in Basso et al. (2010),
we consider the class of scale mixtures of the skew-normal (SMSN) distributions (Branco &
Dey, 2001) for the random errors. The SMSN family is an attractive class of asymmetric
heavy-tailed distributions that are useful for robust inference when the normality
assumption for the error distribution is unrealistic. Some important members of this class
are the skew-t, skew-slash, and skew-contaminated normal distributions (see, e.g., Lachos
et al., 2010). The flexibility of these distributions makes it possible to fit observations
with strong skewness and heavy tails, and they are useful for modeling random phenomena
with extreme values, which generate residual heterogeneity in classical models (Kimura,
1990). To estimate the parameters of the SMSN models, Labra et al. (2012) implemented the
expectation/conditional maximization either (ECME) algorithm of Liu & Rubin (1994). They
incorporated a correction to the EM algorithm to estimate the parameters of a non-linear
model in a faster and more robust way. In addition, Labra et al. (2012) implemented some
model comparison and diagnostic methods. Our study considers the local influence analysis
of Cook (1986) and of Poon & Poon (1999). Our results are based on a dataset containing
individuals aged up to 61 years, so that it has a larger range of variability than that of
the Galvez et al. (2000) study. Hence, the use of this dataset could strongly affect the
estimates of the parameters $L_\infty$, $K$ and $t_0$.

This paper is organized as follows. Section 2 presents the main theoretical aspects of the
methodology implemented by Labra et al. (2012). Section 3 shows the empirical behavior of
lengths across different age categories without distinction of sex. This section also
includes the main estimation and diagnostic results. Finally, Section 4 concludes with a
discussion of these results.



2 Methodology 

In this section, we study the VB non-linear regression model using an approach similar to
that used by Lachos et al. (2011), Labra et al. (2012) and Basso et al. (2010).
Specifically, we consider the non-linear regression model (2) with the assumption that the
random errors $\varepsilon_t$ are independent, heteroscedastic and distributed according to
the SMSN class of distributions. In other words, we suppose that

$$\varepsilon_t = v_t^{-1/2} e_t + \mu_t, \tag{3}$$

$t = 1, \ldots, n$, where the $e_t$ and $v_t$ are independent random quantities and the
$\mu_t$ are location parameters. More precisely, the $e_t$ are independent and
heteroscedastic skew-normal random errors, $e_t \overset{ind}{\sim} SN(0, \sigma_t^2, \lambda_t)$
say, i.e., with density function

$$h(e_t; \sigma_t, \lambda_t) = \frac{2}{\sigma_t}\,\phi\!\left(\frac{e_t}{\sigma_t}\right)\Phi\!\left(\lambda_t \frac{e_t}{\sigma_t}\right), \quad -\infty < e_t < \infty,$$

where $\sigma_t > 0$ and $-\infty < \lambda_t < \infty$ are scale and shape/skewness
parameters, respectively, and $\phi(z)$ and $\Phi(z)$ are, respectively, the density and
distribution function of the standardized normal distribution. Meanwhile, in (3) the $v_t$
are positive (scale) random factors perturbing the skew-normality, which are assumed to be
independent and identically distributed (iid) with distribution function $G(v; \nu)$
defined on $(0, \infty)$ and depending on the unknown parameter $\nu$ (possibly vectorial).

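As we know, the mean and variance of the skew-normal random errors
$e_t \sim SN(0, \sigma_t^2, \lambda_t)$ are given by $E(e_t) = \sqrt{2/\pi}\,\delta_t \sigma_t$
and $\mathrm{Var}(e_t) = \{1 - (2/\pi)\delta_t^2\}\sigma_t^2$, where
$\delta_t = \lambda_t/\sqrt{1 + \lambda_t^2}$. Thus, if we assume in (3) that the moments
$\kappa_k = E(v_t^{-k/2})$, $k = 1, 2$, are finite, then the mean and variance of the SMSN
random errors $\varepsilon_t$ exist and are given by
$E(\varepsilon_t) = \sqrt{2/\pi}\,\kappa_1 \sigma_t \delta_t + \mu_t$ and
$\mathrm{Var}(\varepsilon_t) = \kappa_2 \sigma_t^2 \{1 - (2/\pi)(\kappa_1^2/\kappa_2)\delta_t^2\}$.
In order to have errors with zero mean, we impose the condition
$\mu_t = -\sqrt{2/\pi}\,\kappa_1 \sigma_t \delta_t$. Under this condition, we then have for
the mean and the variance of the response variable $y_t$ that

$$E(y_t) = \eta_t \quad\text{and}\quad \mathrm{Var}(y_t) = \kappa_2 \sigma_t^2 \left\{1 - \frac{2}{\pi}\frac{\kappa_1^2}{\kappa_2}\delta_t^2\right\}, \tag{4}$$

where $\eta_t = \eta(\beta; x_t)$ is the VB curve defined as in (2).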

On the other hand, we can also observe from (3) that, given the scale mixture factors
$v_t$, the random errors $\varepsilon_t$ are independent with skew-normal distribution
$SN(\mu_t, v_t^{-1}\sigma_t^2, \lambda_t)$. Hence, we have from (2) that, conditionally on
the $v_t$, the response variables $y_t$ have a distribution given by
$y_t \mid v_t \overset{ind}{\sim} SN(\eta_t + \mu_t, v_t^{-1}\sigma_t^2, \lambda_t)$,
$t = 1, \ldots, n$, i.e., the marginal density of $y_t$ is

$$f(y_t; \beta, \sigma_t, \lambda_t, \nu) = \frac{2}{\sigma_t}\int_0^\infty \sqrt{v_t}\,\phi\!\left(\sqrt{v_t}\, z_t\right)\Phi\!\left(\sqrt{v_t}\,\lambda_t z_t\right) dG(v_t; \nu), \tag{5}$$

$t = 1, \ldots, n$, where $z_t = (y_t - \eta_t - \mu_t)/\sigma_t$, with
$\eta_t = \eta(\beta; x_t)$ and $\mu_t = -\sqrt{2/\pi}\,\kappa_1 \sigma_t \delta_t$. In this
work, we also assume that $\lambda_t = \lambda$, so that $\delta_t = \delta$, and
$\sigma_t^2 = \sigma^2 m(\rho; x_t)$, where $m(\rho; x_t)$ is a nonnegative function such
that $m(0; x_t) = 1$. More precisely, we consider the function $m(\rho; x_t) = x_t^\rho$.
Consequently, in (5) the model parameters are given by $\beta = (L_\infty, K, t_0)^T$,
$\sigma^2$, $\lambda$, $\rho$ and $\nu$.
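
The stochastic representation (3) translates directly into a simulation recipe. The following Python sketch is not the authors' code: it assumes the Gamma$(\nu/2, \nu/2)$ mixing of Section 2.1 (the skew-t case), uses the standard representation of a skew-normal variate as $\sigma(\delta|Z_0| + \sqrt{1-\delta^2}\,Z_1)$ with independent standard normals $Z_0$ and $Z_1$, and the function name and example values are hypothetical.

```python
import math
import random


def simulate_skew_t_error(sigma_t, lam, nu, rng):
    """Draw one zero-mean error eps_t = v_t^{-1/2} e_t + mu_t as in (3)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    # kappa_1 = E(v^{-1/2}) for v ~ Gamma(nu/2, rate nu/2); finite for nu > 1.
    kappa1 = math.sqrt(nu / 2.0) * math.exp(
        math.lgamma((nu - 1.0) / 2.0) - math.lgamma(nu / 2.0))
    # Location shift that centres the error at zero.
    mu_t = -math.sqrt(2.0 / math.pi) * kappa1 * sigma_t * delta
    # Skew-normal draw e_t ~ SN(0, sigma_t^2, lam).
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    e_t = sigma_t * (delta * abs(z0) + math.sqrt(1.0 - delta * delta) * z1)
    # random.gammavariate takes (shape, scale); rate nu/2 means scale 2/nu.
    v_t = rng.gammavariate(nu / 2.0, 2.0 / nu)
    return e_t / math.sqrt(v_t) + mu_t


rng = random.Random(0)
draws = [simulate_skew_t_error(1.0, 2.0, 10.0, rng) for _ in range(20000)]
print(round(sum(draws) / len(draws), 3))  # sample mean, expected to be near zero
```

The sample mean of the simulated errors should be close to zero, which is exactly what the condition on $\mu_t$ is designed to achieve.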

The SMSN class of densities in (3) provides different asymmetric heavy-tailed models which
are useful for obtaining robust inference in the presence of influential observations or
outliers. The best-known densities obtained from (5) are the skew-t, skew-slash and
skew-contaminated normal (for more details, see Lachos et al., 2010). All these
distributions contain the skew-normal one as a special case. Also, for $\lambda_t = 0$, the
SMSN class reduces to the symmetric class of scale mixtures of normal distributions
considered in Lange & Sinsheimer (1993).



2.1 The skew-t special case 

In this section, we focus our attention principally on the skew-t model with $\nu$
($\nu > 0$) degrees of freedom (Branco and Dey, 2001; Azzalini and Capitanio, 2003;
Arellano-Valle et al., 2012). This model follows by assuming in (3) that the mixing random
factors $v_t$ are iid $Gamma(\nu/2, \nu/2)$, i.e., with density given by

$$g(v_t; \nu) = \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)}\, v_t^{\nu/2 - 1} e^{-\nu v_t/2}, \quad v_t > 0.$$

In this case, we have $\kappa_1 = \sqrt{\nu/2}\,\Gamma[(\nu-1)/2]/\Gamma(\nu/2)$, $\nu > 1$,
and $\kappa_2 = (\nu/2)\Gamma[(\nu-2)/2]/\Gamma(\nu/2) = \nu/(\nu - 2)$, $\nu > 2$. Also, in
(5) we obtain the following skew-t density for the response variables $y_t$:

$$f(y_t; \beta, \sigma_t^2, \lambda_t, \nu) = \frac{2}{\sigma_t}\, t(z_t; \nu)\, T\!\left(\lambda_t z_t \sqrt{\frac{\nu + 1}{\nu + z_t^2}};\ \nu + 1\right), \tag{6}$$

with $z_t$, $\sigma_t^2$ and $\lambda_t$ defined as in (5), and where

$$t(z; \nu) = \frac{\Gamma[(\nu+1)/2]}{\Gamma(\nu/2)\sqrt{\pi\nu}}\left(1 + \frac{z^2}{\nu}\right)^{-(\nu+1)/2},$$

i.e., the symmetric Student-t density with $\nu$ degrees of freedom, and $T(z; \nu)$ denotes
the corresponding Student-t distribution function. The skew-t (ST) contains the Student-t
(T), skew-normal (SN) and normal (N) distributions as special cases, as indicated in the
following scheme:



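$$ST \xrightarrow{\ \nu \to \infty\ } SN \xrightarrow{\ \lambda = 0\ } N$$

$$ST \xrightarrow{\ \lambda = 0\ } T \xrightarrow{\ \nu \to \infty\ } N$$

$$ST \xrightarrow{\ \nu \to \infty,\ \lambda = 0\ } N$$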
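
The moments $\kappa_1$ and $\kappa_2$ of the Gamma mixing distribution are easy to evaluate numerically. The sketch below (hypothetical function names, standard library only) computes them through the log-gamma function for numerical stability; the two values of $\nu$ are the ones selected later in the paper, so the printed numbers can be compared with the $\kappa_1$ and $\kappa_2$ rows of Table 4.

```python
import math


def kappa1(nu):
    """E(v^{-1/2}) for v ~ Gamma(nu/2, rate nu/2); requires nu > 1."""
    if nu <= 1:
        raise ValueError("kappa_1 is finite only for nu > 1")
    return math.sqrt(nu / 2.0) * math.exp(
        math.lgamma((nu - 1.0) / 2.0) - math.lgamma(nu / 2.0))


def kappa2(nu):
    """E(v^{-1}) = nu / (nu - 2); requires nu > 2."""
    if nu <= 2:
        raise ValueError("kappa_2 is finite only for nu > 2")
    return nu / (nu - 2.0)


for nu in (37, 51):
    print(nu, round(kappa1(nu), 3), round(kappa2(nu), 3))
```

Both moments tend to 1 as $\nu$ grows, which is the numerical counterpart of the limits ST to SN and T to N in the scheme.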






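Let $\theta = (L_\infty, K, t_0, \sigma^2, \rho, \lambda)^T$. From (6), and considering the
degrees of freedom parameter $\nu$ known, the ST log-likelihood function for $\theta$ is
thus given by

$$\ell(\theta) = \sum_{t=1}^n \ell_t(\theta), \tag{7}$$

where

$$\ell_t(\theta) = \log 2 - \frac{1}{2}\log\sigma^2 - \frac{\rho}{2}\log x_t + \log\Gamma\!\left(\frac{\nu+1}{2}\right) - \log\Gamma\!\left(\frac{\nu}{2}\right) - \frac{1}{2}\log(\pi\nu) - \frac{\nu+1}{2}\log\!\left(1 + \frac{z_t^2}{\nu}\right) + \log T\!\left(\lambda z_t\sqrt{\frac{\nu+1}{\nu+z_t^2}};\ \nu+1\right),$$

with $z_t = (y_t - \eta_t - \mu_t)/\sigma_t$, $\eta_t = L_\infty(1 - e^{-K(x_t - t_0)})$,
$\mu_t = -\sqrt{2/\pi}\,\kappa_1\sigma_t\delta$ and $\sigma_t^2 = \sigma^2 x_t^\rho$.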

For $\nu$ unknown, an approximation to the maximum likelihood estimator can be computed by
varying $\nu$ over a grid of values and taking the value in the grid that maximizes the
likelihood function as an estimator of $\nu$ (Lange & Sinsheimer, 1993).
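
The grid idea can be sketched for the symmetric T model, i.e. the case $\lambda = 0$ of (6), where $\mu_t = 0$ and only the Student-t density is needed. This is a simplified illustration, not the authors' procedure: the residuals and scales below are hypothetical and are held fixed, whereas a full profile likelihood would re-estimate the remaining parameters for every value of $\nu$.

```python
import math


def t_logpdf(z, nu):
    """Log of the symmetric Student-t density t(z; nu)."""
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(math.pi * nu)
            - 0.5 * (nu + 1.0) * math.log1p(z * z / nu))


def t_loglik(residuals, sigmas, nu):
    """T-model log-likelihood for residuals y_t - eta_t with scales sigma_t."""
    return sum(t_logpdf(r / s, nu) - math.log(s)
               for r, s in zip(residuals, sigmas))


def best_nu_on_grid(residuals, sigmas, grid):
    """Return the grid value of nu with the largest log-likelihood."""
    return max(grid, key=lambda nu: t_loglik(residuals, sigmas, nu))


# Hypothetical residuals with two gross outliers; unit scales for simplicity.
residuals = [-6.0, -1.1, -0.4, 0.0, 0.3, 0.7, 1.2, 7.0]
sigmas = [1.0] * len(residuals)
grid = [2, 5, 15, 25, 35, 45]
print(best_nu_on_grid(residuals, sigmas, grid))
```

With outlying residuals, small values of $\nu$ tend to be preferred, because the heavier tails assign those points a higher density.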

The first and second derivatives of $\ell_t(\theta)$ are given in Basso et al. (2010) for
arbitrary specifications of $\eta_t$ and $\sigma_t^2$. From these results we can obtain the
observed information matrix, namely

$$J(\theta) = -\sum_{t=1}^n J_t(\theta),$$

where $J_t(\theta) = \partial^2 \ell_t(\theta)/\partial\theta\,\partial\theta^T$. Hence, the
covariance matrix of the MLE $\hat\theta$ of $\theta$ can be estimated by
$J(\hat\theta)^{-1}$, and the respective standard errors for the components of $\hat\theta$
by the square roots of its diagonal elements, $[\mathrm{diag}\{J(\hat\theta)^{-1}\}]^{1/2}$.
Also, asymptotic confidence intervals and hypothesis tests can be obtained assuming that
$\hat\theta \sim N(\theta, J(\hat\theta)^{-1})$.
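
Given an estimate and its standard error, the asymptotic normality assumption yields a Wald-type confidence interval. The sketch below uses a hypothetical helper name and, purely as an illustration, the estimate of $L_\infty$ and its standard error reported in Table 5 for the skew-t fit without influential observations.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, level=0.95):
    """Two-sided Wald interval: estimate +/- z_{(1+level)/2} * std_error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


lower, upper = wald_interval(35.097, 0.07)
print(round(lower, 3), round(upper, 3))
```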



2.2 Influence diagnostic analysis 

Influence diagnostic techniques are used to detect observations that may produce exces- 
sive influence in the parameter estimates. There are two main approaches for such tech- 
niques: global influence, which is usually based on case deletion; and local influence, 
which introduces small perturbations in different components of the model. 

In this work, we consider the local influence analysis proposed by Cook (1986) to detect
observations that exert great influence on the maximum likelihood estimators. Thus, we
focus on the case-deletion or case-weight approach, in which the impact of deleting an
observation on the estimators is assessed by means of the so-called likelihood
displacement, defined by

$$LD(\omega) = 2\{\ell(\hat\theta) - \ell(\hat\theta_\omega)\}.$$

Here, $\hat\theta$ and $\hat\theta_\omega$ denote the MLEs of $\theta$ under the unperturbed
and perturbed models, respectively, and $\omega$ is a vector which represents the
perturbation scheme, for example, a collection of case weights. For a given $\omega_0$, we
have $\ell(\theta \mid \omega_0) = \ell(\theta)$ and so $\hat\theta_{\omega_0} = \hat\theta$.
In this sense, a graph of $LD(\omega)$ versus $\omega$ contains essential information on the
influence of the perturbation scheme in question. Cook (1986) called the geometric surface
$\psi(\omega) = [\omega^T, LD(\omega)]^T$ the influence graph. Also, to characterize the
behavior of an influence graph around $\omega_0$, Cook (1986) defined the normal curvature
of $\psi(\omega)$ in the direction of a vector $d$ of unit length as

$$C_d = 2\,|d^T H^T J^{-1} H d|,$$

where $J = J(\theta) = -\partial^2\ell(\theta)/\partial\theta\,\partial\theta^T$ is the
observed information matrix and
$H = \partial^2\ell(\theta \mid \omega)/\partial\theta\,\partial\omega^T$, which are
evaluated at $\theta = \hat\theta$ and $\omega = \omega_0$. The maximum curvature occurs in
the direction of $d_{\max}$, the eigenvector associated with the largest eigenvalue of the
matrix $F = H^T J^{-1} H$. Hence, the vector $d_{\max}$ gives information on the direction
in which $LD(\omega)$ shows the most sensitivity.

Since $C_d$ is not invariant under a uniform change of scale, Poon and Poon (1999)
proposed the conformal normal curvature given by

$$B_d = \frac{C_d}{2\{\mathrm{tr}(F^T F)\}^{1/2}},$$

which is such that $0 \le B_d \le 1$ for any direction $d$. Thus, they propose to classify
the $t$th observation as a possible influential observation if $B_{d_t}$ is greater than
the benchmark $C_d = M_0 + \tau\sqrt{\mathrm{Var}[M_0]}$ for a selected constant $\tau$
depending on the observations, where

$$M_0 = \frac{1}{n}\sum_{t=1}^n B_{d_t} \quad\text{and}\quad \mathrm{Var}[M_0] = \frac{1}{n-1}\sum_{t=1}^n (B_{d_t} - M_0)^2.$$

This should be interpreted as the effect of the $t$th deleted observation on the
log-likelihood function.
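
The benchmark rule can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the authors' implementation: it takes $d_t$ to be the $t$th canonical unit vector, so that $d_t^T F d_t$ is the $t$th diagonal element of $F$, and the matrix $F$ and the constant $\tau$ used in the example are hypothetical.

```python
import math
from statistics import mean, stdev


def conformal_curvatures(F):
    """B_{d_t} = |F_tt| / {tr(F^T F)}^{1/2} for the coordinate directions d_t."""
    frobenius = math.sqrt(sum(v * v for row in F for v in row))
    return [abs(F[t][t]) / frobenius for t in range(len(F))]


def flag_influential(B, tau):
    """Indices t with B_{d_t} above M_0 + tau * sqrt(Var[M_0])."""
    m0 = mean(B)
    cutoff = m0 + tau * stdev(B)  # stdev uses the n - 1 denominator
    return [t for t, b in enumerate(B) if b > cutoff], cutoff


# Hypothetical 4 x 4 matrix F = H^T J^{-1} H with one dominant observation.
F = [[0.1, 0.0, 0.0, 0.0],
     [0.0, 0.1, 0.0, 0.0],
     [0.0, 0.0, 0.1, 0.0],
     [0.0, 0.0, 0.0, 2.0]]
B = conformal_curvatures(F)
flagged, cutoff = flag_influential(B, tau=1.0)
print(flagged, round(cutoff, 3))
```

Because $B_d$ is normalized to lie between 0 and 1, the same cutoff rule can be compared across models fitted to the same data.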



3 Results 

In recent years, several age estimates for fish living in deep waters have been
re-evaluated, and in many cases maximum ages have been predicted to be drastically older
than those previously considered (Cailliet & Andrews, 2008). This situation has been
observed in Cardinalfish, where a previous age allocation process, using the entire
sagittal otolith, gave a maximum age of 15 years (Galvez et al., 2000; Cubillos et al.,
2009). However, a new analysis considering transversal sections of these otoliths found
that the Cardinalfish longevity is 54 years (Ojeda et al., 2010). Also, longevity or life
span (average time between birth and death) is an important variable that must be
considered in controlling the exploitation of some species, since longevity and growth rate
are directly related to the natural mortality and productivity of the population of those
species (Hewitt & Hoenig, 2005).



[Figure 1 panels: histogram of lengths (cm) and histogram of ages (years).]

Figure 1: Histograms of ages and lengths for Cardinalfish.
Table 1: Summary of descriptive statistics for Cardinalfish lengths (cm) by age category (years).

Ages      Min.   Max.   Mean      S.D.     N. obs.   Proportion
1-3       13     13     13        -        1         0.04%
3-8       12     26     19.218    3.375    165       6.14%
8-13      19     32     24.317    2.293    309       11.50%
13-18     21     34     27.069    2.374    261       9.71%
18-23     24     36     30.203    2.070    197       7.33%
23-28     26     37     32.323    1.823    269       10.01%
28-33     30     37     33.305    1.470    456       16.97%
33-38     30     39     33.781    1.464    410       15.26%
38-43     31     39     34.102    1.327    333       12.39%
43-48     32     40     34.549    1.550    184       6.85%
48-53     32     40     34.461    1.562    76        2.83%
53-58     31     36     34.381    1.161    21        0.78%
58-61     33     36     34.8      1.304    5         0.19%
Total     12     40     30.77     4.87     2687      100%



In what follows, we apply the methodology for the VB model described above to analyze a
real sample of 2687 Cardinalfish observations. A descriptive analysis of these data is
summarized in Figure 1 and Table 1. All the statistical methods considered in this study,
including parameter estimation via the ECME algorithm, variance-covariance matrix
computation, and diagnostic tools, have been implemented in the R software (R Development
Core Team, 2012) in the skewtools package developed by Contreras-Reyes (2012). Other
methods implemented in R by Kahm et al. (2010) consider different conditions in order to
derive a conclusive dose-response curve, for instance for a compound that potentially
affects the growth curve; for example, length of the lag phase, maximal growth rate, and
stationary phase. The skewtools package, on the other hand, deals with the log-likelihood
function, the distribution of the errors, and the influence analysis of observations.

Figures 2 and 3 show the scatter-box-histogram plots for ages between 3 and 61 years for
the studied species, which are classified into 13 categories, most of which span 5 years,
as indicated in Table 1. Table 1 shows some descriptive statistics associated with the
lengths of the individuals in these categories. As we can see from Figure 1, the empirical
distribution of the lengths of younger subjects (aged between 3 and 38 years) is symmetric
with light tails. Also, this distribution has its mode in the 28-33 year range, which
contains 456 observations. However, the distribution associated with the lengths of older
subjects presents asymmetries and heavy tails. For example, for the 43-48 year category,
we observe a high frequency of 184 individuals with lengths between 32 and 40 cm. Moreover,
if we consider the relative frequencies in Table 1 and the age histogram in Figure 1, we
again see a low number of observations for the 1-3, 48-53, 53-58 and 58-61 year age
categories.

In this study we proceed to evaluate the performance of each of the T and ST models as
follows (for the N and SN models, step 1 is omitted):

1. For the estimation of the VB model parameters from the T and/or ST models, we
need to find an estimate for the degrees of freedom parameter $\nu$ that maximizes
the corresponding log-likelihood functions. To do this, Lange & Sinsheimer (1993)
recommend computing profiles of the log-likelihood function for given values of $\nu$,
where the optimal $\nu$ is the one that maximizes the profile log-likelihood function.

2. Having computed the parameter set $\hat\theta$ by the ECME algorithm, we proceed to
diagnose the model associated with this parameter set by the local influence analysis. A
re-estimation of the parameter set, $\hat\theta_0$ say, from the data without the
influential observations is thus obtained.

Figure 2: Histograms, scatter-plots and box-plots of lengths for the age categories [3-8],
(8-13], (13-18], (18-23], (23-28] and (28-33] for Cardinalfish.

Figure 3: Histograms, scatter-plots and box-plots of lengths for the age categories (33-38],
(38-43], (43-48], (48-53], (53-58] and (58-61] for Cardinalfish.

3. To evaluate the change produced by the influential observations (detected by using
the local influence diagnostic analysis described previously) in the estimation of
each component $\beta_k$, $k = 1, 2, 3$, of the vector of parameters $\beta$ associated
with the VB curve, with respect to the estimated parameters including the outliers, we
proceed to a confirmatory analysis via the percentage relative change (RC):

$$RC = \left|\frac{\hat\beta_k - \hat\beta_{k0}}{\hat\beta_k}\right| \times 100\%$$

(see, e.g., Bastos et al., 2012), where $\hat\beta_k$ is the estimate of $\beta_k$ from the
complete data, and $\hat\beta_{k0}$ is the corresponding estimate obtained from the data
without the influential observations. Therefore, the RC index represents the percentage
Table 2: T model fits for $\nu = 2, 5, \ldots, 45$ considering the full data and the data
without influential observations (IO).

                        $\nu$ = 2    $\nu$ = 5    $\nu$ = 15   $\nu$ = 25   $\nu$ = 35   $\nu$ = 45
Complete data
  $L_\infty$            34.897       34.993       35.075       35.099       35.111       35.118
  SE($L_\infty$)        0.060        0.073        0.082        0.084        0.086        0.086
  $K$                   0.087        0.085        0.084        0.084        0.083        0.083
  SE($K$)               0.001        0.002        0.002        0.002        0.002        0.002
  $t_0$                 -2.727       -2.858       -2.968       -3.002       -3.019       -3.029
  SE($t_0$)             0.201        0.240        0.269        0.277        0.280        0.282
  N                     2687         2687         2687         2687         2687         2687
  AIC                   10951.699    10663.449    10593.598    10588.949    10588.359    10588.445
  $\ell(\hat\theta)$    -5470.849    -5326.725    -5291.799    -5289.475    -5289.180    -5289.223
Without IO
  $L_\infty$            34.889       34.976       35.046       35.065       35.074       35.079
  SE($L_\infty$)        0.060        0.072        0.080        0.083        0.084        0.084
  $K$                   0.087        0.086        0.085        0.084        0.084        0.084
  SE($K$)               0.001        0.002        0.002        0.002        0.002        0.002
  $t_0$                 -2.692       -2.781       -2.848       -2.864       -2.872       -2.876
  SE($t_0$)             0.199        0.235        0.263        0.270        0.273        0.275
  N                     2679         2678         2678         2678         2678         2678
  $\ell(\hat\theta)$    -5434.569    -5283.160    -5244.821    -5241.333    -5240.446    -5240.130
RC
  $L_\infty$            0.023%       0.05%        0.082%       0.096%       0.11%        0.11%
  $K$                   0%           1.17%        1.19%        0%           1.21%        1.21%
  $t_0$                 1.28%        2.69%        4.04%        4.59%        4.87%        5.05%




Figure 4: (a) Profile log-likelihood for the T model fit. The dashed line corresponds to
the maximum log-likelihood at $\nu = 37$. (b) Profile log-likelihood for the ST model fit.
The dashed line corresponds to the maximum log-likelihood at $\nu = 51$.

change for a parameter when we exclude the influential observations from the data set.

4. We also compare the N, SN, T and ST models using the AIC (Akaike's information
criterion) value given by Akaike (1974), defined as

$$AIC(\hat\theta) = -2[\ell(\hat\theta) - q]$$
Table 3: ST model fits for $\nu = 2, 5, \ldots, 45$ considering the full data and the data
without influential observations (IO).

                        $\nu$ = 2    $\nu$ = 5    $\nu$ = 15   $\nu$ = 25   $\nu$ = 35   $\nu$ = 45
Complete data
  $L_\infty$            35.025       35.038       35.098       35.119       35.128       35.134
  SE($L_\infty$)        0.059        0.069        0.071        0.07         0.069        0.068
  $K$                   0.086        0.085        0.084        0.083        0.083        0.083
  SE($K$)               0.001        0.002        0.002        0.002        0.002        0.001
  $t_0$                 -2.896       -2.937       -3.020       -3.049       -3.063       -3.071
  SE($t_0$)             0.199        0.230        0.236        0.231        0.227        0.225
  N                     2687         2687         2687         2687         2687         2687
  AIC                   10947.539    10659.820    10589.693    10584.585    10583.607    10583.392
  $\ell(\hat\theta)$    -5467.769    -5323.910    -5288.846    -5286.293    -5285.803    -5285.696
Without IO
  $L_\infty$            35.033       35.050       35.094       35.093       35.090       35.095
  SE($L_\infty$)        0.059        0.069        0.071        0.071        0.071        0.07
  $K$                   0.086        0.084        0.084        0.084        0.084        0.084
  SE($K$)               0.001        0.002        0.002        0.002        0.002        0.002
  $t_0$                 -2.922       -3.016       -2.907       -2.889       -2.876       -2.882
  SE($t_0$)             0.202        0.231        0.233        0.232        0.233        0.231
  N                     2677         2684         2680         2678         2677         2677
  $\ell(\hat\theta)$    -5439.118    -5314.756    -5258.536    -5244.444    -5236.042    -5235.694
RC
  $L_\infty$            0.02%        0.03%        0.01%        0.07%        0.11%        0.11%
  $K$                   0%           1.19%        0%           1.19%        1.19%        1.19%
  $t_0$                 0.89%        2.62%        3.89%        5.54%        6.50%        6.56%




associated with the parameter set $\theta$, where $q$ is the number of parameters.

Figure 5: Left: Cardinalfish observations. The thick line corresponds to the VB fit using
the ST distribution for $\nu = 51$ and the dashed lines correspond to confidence intervals
at the 5% significance level. Right: $\log\sqrt{\mathrm{Var}[y_t]}$ values for the N and ST
distributions.
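
The relative change of step 3 and the AIC of step 4 are simple formulas, and a short sketch makes the arithmetic explicit. The function names are hypothetical; the inputs are the T-model values for $t_0$, the log-likelihood and $q$ as reported in Table 5, so small differences from the tabulated AIC can arise from rounding of the log-likelihood.

```python
def relative_change(full_estimate, filtered_estimate):
    """Percentage relative change between full-data and filtered estimates."""
    return abs((full_estimate - filtered_estimate) / full_estimate) * 100.0


def aic(loglik, n_params):
    """AIC = -2 [ loglik - q ], with q the number of parameters."""
    return -2.0 * (loglik - n_params)


# T model: t0 = -3.021 (full data) and -2.943 (without influential observations).
print(round(relative_change(-3.021, -2.943), 2))  # 2.58
# T model: log-likelihood -5289.18 with q = 5 parameters.
print(round(aic(-5289.18, 5), 2))  # 10588.36
```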

Tables 2 and 3 present the fits of the T and ST models, respectively, for
$\nu = 2, 5, 15, \ldots, 45$. Table 4 summarizes and compares the T and ST model fits for
$\nu = 37$ and $\nu = 51$, respectively. As shown in Figure 4, these values for the degrees
of freedom maximize the corresponding profile log-likelihood functions. All of these
tables include estimates of the VB parameters with their respective standard errors and
the AIC and log-likelihood values for both the full and the filtered (by influential
observations) samples. Note from Figure 4 that, considering the full sample, the T
log-likelihood is maximized when $\nu = 37$.
Table 4: Summary of T (with $\nu = 37$) and ST (with $\nu = 51$) model fits considering the
full data and the data without influential observations (IO). Standard errors are in
parentheses.

                        Complete Data                Without IO
Parameter               T            ST              T            ST
$L_\infty$              35.113       35.137          35.095       35.097
                        (0.086)      (illegible)     (0.085)      (0.07)
$K$                     0.083        0.083           0.084        0.084
                        (0.002)      (0.001)         (0.002)      (0.001)
$t_0$                   -3.021       -3.075          -2.943       -2.884
                        (0.281)      (0.224)         (0.277)      (0.23)
$\rho$                  -0.690       -0.705          -0.688       -0.703
$\sigma^2$              25.961       38.087          25.359       34.868
$\lambda$               -            0.873           -            0.755
$\nu$                   37           51              37           51
$\kappa_1$              1.021        1.015           1.021        1.015
$\kappa_2$              1.057        1.041           1.057        1.041
$\ell(\hat\theta)$      -5289.176    -5285.687       -5248.69     -5235.583
n                       2687         2687            2680         2677
AIC                     10588.35     10583.37        10507.38     10483.17



Considering this value for the degrees of freedom and the sample filtered by influential
observations, we obtain the estimates $\hat L_\infty = 35.095$, $\hat K = 0.084$ and
$\hat t_0 = -2.943$. Results are obtained similarly for the ST model, where the ST
log-likelihood is maximized when $\nu = 51$. For this value of $\nu$, the estimates of the
VB parameters under the filtered sample are $\hat L_\infty = 35.097$, $\hat K = 0.084$ and
$\hat t_0 = -2.884$. Consequently, we could take the estimates obtained for $\nu = 37$ as
the selected values of the VB parameters in the T case, and those obtained for $\nu = 51$
in the ST case. As was mentioned in Section 1, these results could be significantly
modified by the incorporation of the missing data in the 1-3 age category, particularly
the estimate of $t_0$ and the criteria for influential data and model selection.

Now, we describe the main results of our analysis for the parameters estimated without
influential observations. From the T model fit with $\nu = 37$ we have $\hat\rho = -0.69$
and $\hat\sigma^2 = 25.36$. Also, for the parameters $\kappa_1$ and $\kappa_2$ associated
with the $Gamma(\nu/2, \nu/2)$ mixing distribution, we obtain $\kappa_1 = 1.02$ and
$\kappa_2 = 1.06$. These values are approximately equal to the corresponding ones for the N
and SN distributions, for which $\kappa_1 = \kappa_2 = 1$. From the ST model fit with
$\nu = 51$, we have $\hat\rho = -0.703$, $\hat\sigma^2 = 34.87$ and $\hat\lambda = 0.39$.
For the N case we also find that $\hat\rho = -0.68$ and $\hat\sigma^2 = 26.93$. Figure 5
shows the confidence bands $\{\eta_t \pm z_{(1-\alpha/2)}\sqrt{\mathrm{Var}[y_t]}\}$
obtained from the ST case, and the behavior of the VB growth curve and its respective
variances represented by $\log\sqrt{\mathrm{Var}[y_t]}$. Here, $z_{(1-\alpha/2)}$ denotes
the standard normal quantile related to a significance level $\alpha$, and
$\eta_t = E(y_t)$ is given in (4).
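
The confidence bands combine the VB mean with the variance in (4) and the variance function $\sigma_t^2 = \sigma^2 x_t^\rho$. The following sketch shows one way to compute them; the function names and dictionary keys are hypothetical, and the parameter values are the rounded ST estimates quoted in the paragraph above, used here only for illustration.

```python
import math
from statistics import NormalDist


def vb_mean(age, l_inf, k, t0):
    """VB mean length eta_t at a given age."""
    return l_inf * (1.0 - math.exp(-k * (age - t0)))


def response_variance(age, sigma2, rho, lam, kappa1, kappa2):
    """Var(y_t) from equation (4) with sigma_t^2 = sigma^2 * age**rho."""
    delta2 = lam * lam / (1.0 + lam * lam)
    sigma_t2 = sigma2 * age ** rho
    return sigma_t2 * (kappa2 - (2.0 / math.pi) * kappa1 ** 2 * delta2)


def confidence_band(age, p, alpha=0.05):
    """Return (lower, eta_t, upper) for eta_t +/- z * sqrt(Var[y_t])."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    eta = vb_mean(age, p["l_inf"], p["k"], p["t0"])
    sd = math.sqrt(response_variance(age, p["sigma2"], p["rho"], p["lam"],
                                     p["kappa1"], p["kappa2"]))
    return eta - z * sd, eta, eta + z * sd


# Rounded ST values quoted in the text (illustrative only).
st_params = {"l_inf": 35.097, "k": 0.084, "t0": -2.884, "sigma2": 34.87,
             "rho": -0.703, "lam": 0.39, "kappa1": 1.015, "kappa2": 1.041}

for age in (5, 10, 50):
    print(age, [round(v, 2) for v in confidence_band(age, st_params)])
```

Since the estimated $\rho$ is negative, the bands narrow as age increases, which reflects the heteroscedasticity captured by the variance function.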

4 Discussion 

With the aim of comparing the different distributions in this study, Table 5 shows the
estimates of the VB vector of parameters $(L_\infty, K, t_0)$ and their respective standard
errors obtained from the N, T, SN and ST models. These models are fitted using the full
sample and a sample filtered by influential observations according to the local influence
criteria described in Section 2.2. From the results obtained, we cannot see a perceptible
difference in the estimates of $(L_\infty, K, t_0)$ across the four fitted models. However,
the standard errors associated with these parameters are smaller for the ST model. This
occurs mainly because of the prevalence of missing observations in the range of 1 to 3
years, which increases the $\mathrm{Var}[y_t]$ related to these years. Also, in Table 5 we
present the AIC criterion to compare the fit of these four models considering the filtered
sample data. As was previously mentioned, the N and SN model fits detect a larger number of
influential observations (54 and 50, respectively). In addition, we observe that the ST fit
has the smallest AIC value. Moreover, for this model the estimated VB parameters are
$\hat L_\infty = 35.097$, $\hat K = 0.084$ and $\hat t_0 = -2.884$, considering the dataset
without influential observations. Note that for the four model fits, the RC in the
estimates of $L_\infty$ and $K$ between the filtered and full samples is at most 0.11% and
1.21%, respectively; nevertheless, the estimate of $t_0$ shows disparate changes, with the
ST distribution presenting the largest variation (6.21%).

Figure 6: Plots of Cook's influence analysis for the four distributions; the T and ST cases
assume $\nu = 37$ and $\nu = 51$ degrees of freedom, respectively. The dotted line
corresponds to the decision level $C_d$, the Index axis corresponds to the observation
index and the $M(0)$ axis to the curvature value $M_0(\theta)$. The enumerated lines
correspond to the observation indexes for which $M_0(\theta) > C_d$.

Results of the diagnostic analysis are illustrated in Figure 6. Cook's diagnostic measure
tends to be sensitive to structural changes in the covariance matrix associated with the
parameter estimators based on the T and ST models; specifically, to changes in the
parameter $\nu$ (Labra et al., 2012). Note from Table 5 that the standard errors (SE) of
the estimates obtained from the ST model tend to be smaller than those obtained from the
other models. In fact, we can see that the reference level $C_d$ for the T
Table 5: Estimates and standard errors (SE) for the full data of the VB growth curve
parameters $\beta^T = (L_\infty, K, t_0)$ associated with the normal (N), Student-t (T),
skew-normal (SN) and skew-t (ST) fits. The T and ST cases assume $\nu = 37$ and $\nu = 51$
degrees of freedom, respectively. The rows $\hat\beta_0$ and the SE row that follows each
of them give the corresponding estimates and standard errors for the dataset without
influential observations.

Model   Row               $L_\infty$   $K$      $t_0$     n       q    $\ell(\hat\theta)$   AIC
N       $\hat\beta$       35.147       0.083    -3.072    2687    5    -5291.7              10591.4
        SE                0.089        0.002    0.290     -       -    -                    -
        $\hat\beta_0$     35.103       0.084    -2.922    2633    5
        SE                0.085        0.002    0.269     -       -    -                    -
        RC                0.1%         1.2%     5%        2.01%
T       $\hat\beta$       35.113       0.083    -3.021    2687    5    -5289.18             10588.35
        SE                0.086        0.002    0.281     -       -    -                    -
        $\hat\beta_0$     35.095       0.084    -2.943    2680    5
        SE                0.085        0.002    0.277
        RC                0.05%        1.21%    2.58%     0.26%
SN      $\hat\beta$       35.147       0.083    -3.071    2687    6    -5290.7              10593.4
        SE                0.089        0.002    0.290
        $\hat\beta_0$     35.109       0.084    -2.905    2637    6
        SE                0.085        0.002    0.271
        RC                0.1%         1.2%     5.4%      1.86%
ST      $\hat\beta$       35.137       0.083    -3.075    2687    6    -5285.69             10583.37
        SE                0.085        0.002    0.227
        $\hat\beta_0$     35.097       0.084    -2.884    2677    6
        SE                0.07         0.001    0.23
        RC                0.11%        1.21%    6.21%     0.37%









and ST cases is around to 0.004, while that for the N and SN cases this level is around 
to 0.002. This fact produces that, under the N or SN models, several observations are 
identified as influential; while under the T and ST assumptions, they are considered not 
influential data. Specifically, 54 (2.01%) influential observations are identified for the N 
model, 50 (1.86%) for the SN model, 7 (0.26%) for the T model and 10 (0.37%) for the 
ST model. Numerical simulations carried out by Lachos et al. (2011) showed that for 
similar parameter values, the T and ST distributions effectively allow a lower number of 
influential observations than that of the N and SN distributions. Our results show that the 
optimal degrees of freedom parameters are large for the T and ST model fits. This should 
be interpreted as an approximation of the T and ST model fits to the N and SN model 
fits, respectively. However, the ST model fit produces the estimation with the smallest 
standard errors and gives a minor number of influential observations, given some features 
of robustness of the maximum likelihood estimation under T and ST distributions (Labra 
et al., 2012). 
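
The model comparison and the influence percentages discussed above can be reproduced with simple arithmetic. The sketch below assumes that the values listed just before the AIC values in Table 5 are maximized log-likelihoods and that q is the number of estimated parameters, so that AIC = -2 log-lik + 2q. Under that assumption the T, SN and ST rows agree with the table up to rounding. The N row is left out because its tabulated AIC (10591.4) differs by 2 from this formula with q = 5. All names in the code are illustrative.

```python
# AIC = -2 * loglik + 2 * q, checked against the full-sample fits in Table 5.
# Assumption: the column preceding AIC holds the maximized log-likelihood.

def aic(loglik, q):
    """Akaike information criterion for a fit with q estimated parameters."""
    return -2.0 * loglik + 2.0 * q

fits = {
    "T": {"loglik": -5289.18, "q": 5, "aic_table": 10588.35},
    "SN": {"loglik": -5290.70, "q": 6, "aic_table": 10593.40},
    "ST": {"loglik": -5285.69, "q": 6, "aic_table": 10583.37},
}
aic_values = {model: aic(f["loglik"], f["q"]) for model, f in fits.items()}
best = min(aic_values, key=aic_values.get)  # model with the smallest AIC

# Share of observations flagged as influential, out of n = 2687.
n_full = 2687
influential = {"N": 54, "SN": 50, "T": 7, "ST": 10}
pct = {model: 100.0 * k / n_full for model, k in influential.items()}

for model, value in aic_values.items():
    print(f"{model}: AIC = {value:.2f} (table: {fits[model]['aic_table']})")
print("Smallest AIC:", best)
for model, value in pct.items():
    print(f"{model}: {influential[model]} influential ({value:.2f}%)")
```

The filtered sample sizes in Table 5 follow directly from these counts, for example 2687 - 10 = 2677 for the ST fit.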

The estimates of the VB parameters obtained by Cubillos et al. (2009) coincide with
those obtained by Galvez & Rebolledo (2011), who report L∞ = 46.8, K = 0.147 and
t0 = 0. However, the methodology adopted by Galvez & Rebolledo (2011) considers an
exponential relationship between the otolith mass and the length, which should produce
some errors related to age assignment (Ojeda et al., 2010). On the other hand, we have
considered observations associated with older subjects but with missing observations of
younger subjects (Ojeda et al., 2010), because the younger fraction, of length less than
20 cm, does not appear in the catch (Galvez & Rebolledo, 2011; Wiff et al., 2005). This
should produce some differences in the parameter estimates with respect to those presented
by Cubillos et al. (2009), who estimated a positive value for t0 and an overestimation and
underestimation for K and the asymptotic length L∞, respectively. Our negative estimate
of t0 is produced mainly by the missing observations in the first age category (1-3];
this could be addressed by the use of back-calculation (see e.g. Francis, 1990, and the
references therein) or by improved otolith reading techniques to infer the length at an
earlier time or times. Consequently, considering that the values for the VB parameters
have an impact on the mortality of Cardinalfish (Hewitt & Hoenig, 2005), these differences
lead to a totally different exploitation scenario.
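
To illustrate how far apart the two parameter sets are, the following sketch evaluates the VB curve at a few ages. It assumes the standard VB form L(t) = L∞ (1 - exp(-K (t - t0))), with the ST estimates from the filtered sample and the values reported by Galvez & Rebolledo (2011) as quoted above. The chosen ages and all names in the code are illustrative, not taken from the original analysis.

```python
import math

def vb_length(age, l_inf, k, t0):
    """von Bertalanffy length at a given age: l_inf * (1 - exp(-k * (age - t0)))."""
    return l_inf * (1.0 - math.exp(-k * (age - t0)))

# Parameter sets quoted in the text (lengths in cm, ages in years).
st_fit = {"l_inf": 35.097, "k": 0.084, "t0": -2.884}  # ST fit, filtered sample
gr_2011 = {"l_inf": 46.8, "k": 0.147, "t0": 0.0}      # Galvez & Rebolledo (2011)

ages = [5, 15, 30, 54]  # illustrative ages only
for age in ages:
    ours = vb_length(age, **st_fit)
    theirs = vb_length(age, **gr_2011)
    print(f"age {age:2d}: ST fit {ours:6.2f} cm, Galvez & Rebolledo {theirs:6.2f} cm")
```

Under this form the predicted length is zero at age t0 and approaches L∞ for old fish, so at large ages the gap between the two curves is driven mainly by the difference in asymptotic length.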

Different distributions have been presented in this study with the aim of providing new
estimation and diagnostic tools relative to methods that assume Gaussian residuals in
growth models. Since the normality assumption is questionable for these observations, the
flexible class of SMSN distributions provides robust models to estimate the parameters of
the VB growth curve and thus to determine appropriate mortality indices. In addition, we
have included an analysis of local influence, which allows the identification of anomalous
observations, using the t-Student and skew-t distributions. Given the special features of
the Cardinalfish, such as its longevity, our proposal allows us to find differences in the
estimates of the VB parameters with respect to the results presented in the literature.
Finally, the proposed methodology could be used to analyze the growth features of other
species and other growth models.